Abstract

The main theme of this paper is a modification of the likelihood ratio test (LRT) for
testing high-dimensional covariance matrices. Recently, the correct asymptotic distribution
of the LRT in the large-dimensional case (where $p/n$ approaches a constant
$\gamma \in (0,1]$) has been specified by researchers. The resulting procedure is called the
corrected LRT. Despite its correction, the corrected LRT is a function of the sample
eigenvalues, which suffer from redundant variability due to high dimensionality;
consequently, it still does not have full power in differentiating hypotheses on the
covariance matrix. In this paper, motivated by the successes of the linearly shrunken
covariance matrix estimator (simply, the shrinkage estimator) in various applications, we
propose a regularized LRT that uses the shrinkage estimator instead of the sample
covariance matrix in defining the LRT. We compute the asymptotic distribution of the
regularized LRT when the true covariance matrix is the identity matrix or a spiked
covariance matrix. The obtained asymptotic results have applications in testing various
hypotheses on the covariance matrix. Here, we apply them to testing the identity of the
true covariance matrix, which is a long-standing problem in the literature, and show that
the regularized LRT outperforms the corrected LRT, its non-regularized counterpart. In
addition, we compare the power of the regularized LRT with those of recent
non-likelihood-based procedures.

Keywords: Asymptotic normality; covariance matrix estimator; identity covariance 
matrix; high dimensional data; linear shrinkage estimator; linear spectral statistics; 
random matrix theory; regularized likelihood ratio test; spiked covariance matrix. 


1 Introduction 


High-dimensional data are now prevalent everywhere; examples include genomic data in biology,
financial time series data in economics, and natural language processing data in machine
learning and marketing. Traditional procedures, which assume that the sample size $n$ is large
and the dimension $p$ is fixed, are no longer valid for the analysis of high-dimensional data.
A significant amount of research has been carried out to resolve the difficulties arising
from the dimensionality of the data.

This paper considers the inference problem for a large-scale covariance matrix whose
dimension $p$ is large compared to the sample size $n$. To be specific, we are interested in
testing whether the covariance matrix equals a given matrix, $H_0: \Sigma = \Sigma_0$, where
$\Sigma_0$ can be set to $I_p$ without loss of generality. The likelihood ratio test (LRT)
statistic for testing $H_0: \Sigma = I_p$ is defined by

$$\mathrm{LRT} = \mathrm{tr}(S_n) - \log|S_n| - p = \sum_{i=1}^{p} (l_i - \log l_i - 1),$$

where $S_n$ is the unbiased and centered sample covariance matrix and $l_i$ is the $i$-th
largest eigenvalue of the sample covariance matrix. When $p$ is finite, the LRT asymptotically
follows the chi-square distribution with $p(p+1)/2$ degrees of freedom. However, this does not
hold when $p$ increases. Its correct asymptotic distribution is computed by Bai et al. (2009)
for the case where $p/n$ approaches $\gamma \in (0,1)$ and both $n$ and $p$ increase. They
further show numerically that their asymptotic normal distribution defines a valid procedure
for testing $H_0: \Sigma = I_p$. The results of Bai et al. (2009) are refined by Jiang et al.
(2012), who include the asymptotic null distribution for the case $\gamma = 1$. Despite the
correction of the null distribution, the sample covariance matrix is known to have redundant
variability when $p$ is large, and it remains an open question whether the LRT is
asymptotically optimal for the testing problem when both $n$ and $p$ are large.

In this paper, it is shown that the corrected LRT can be further improved by introducing
a linear shrinkage component. In detail, we consider a modification of the LRT, referred to as
the regularized LRT (rLRT), defined by

$$\mathrm{rLRT} = \mathrm{tr}(\hat\Sigma) - \log|\hat\Sigma| - p = \sum_{i=1}^{p} (\psi_i - \log\psi_i - 1), \qquad (1)$$

where $\hat\Sigma$ is a regularized covariance matrix and $\psi_i$ is the $i$-th largest
eigenvalue of $\hat\Sigma$. Here, we consider regularization via linear shrinkage:

$$\hat\Sigma = \lambda S_n + (1 - \lambda) I_p. \qquad (2)$$

We also occasionally write rLRT($\lambda$) to emphasize the value of $\lambda$ used. The linearly
shrunken sample covariance matrix (simply, the shrinkage estimator) is known to reduce the
expected estimation loss of the sample covariance matrix (Ledoit and Wolf, 2004). It has also
been applied successfully in many high-dimensional procedures to resolve the dimensionality
problem. For example, Schafer and Strimmer (2005) reconstruct a gene regulatory network from
microarray gene expression data using the inverse of a regularized covariance matrix. Chen
et al. (2011) propose a modified Hotelling's $T^2$-statistic for testing high-dimensional mean
vectors and apply it to finding differentially expressed gene sets. Motivated by the
success of these examples, we investigate whether the power can be improved by the reduced
variability obtained via linear shrinkage. To the best of our knowledge, our work is the first
to apply linear shrinkage to the covariance matrix testing problem itself.
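As a minimal sketch of definitions (1) and (2), both statistics can be computed from the eigenvalues of the centered sample covariance matrix. The function names and the toy data below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def lrt_statistic(X):
    """LRT = tr(S_n) - log|S_n| - p for an n x p data matrix X (needs p < n)."""
    S = np.cov(X, rowvar=False)              # centered, unbiased sample covariance
    eig = np.linalg.eigvalsh(S)              # sample eigenvalues l_1, ..., l_p
    return float(np.sum(eig - np.log(eig) - 1.0))

def rlrt_statistic(X, lam):
    """rLRT(lam) based on the shrinkage estimator lam * S_n + (1 - lam) * I_p."""
    S = np.cov(X, rowvar=False)
    psi = lam * np.linalg.eigvalsh(S) + (1.0 - lam)   # eigenvalues of the shrinkage estimator
    return float(np.sum(psi - np.log(psi) - 1.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8))             # hypothetical toy data: n = 40, p = 8
print(lrt_statistic(X), rlrt_statistic(X, 0.5))
```

Because the shrinkage estimator shares its eigenvectors with $S_n$, its eigenvalues are simply $\lambda l_i + (1 - \lambda)$, so one eigendecomposition serves every value of $\lambda$; setting $\lambda = 1$ recovers the LRT.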

We derive the asymptotic distribution of the proposed rLRT($\lambda$) under two scenarios:
(i) when $\Sigma = I_p$, for the null distribution, and (ii) when $\Sigma = \Sigma_{spike}$, for
the power study. Here $\Sigma_{spike}$ denotes a covariance matrix from the spiked population
model (Johnstone, 2001); roughly, it is defined as a covariance matrix whose eigenvalues are
all 1 except for a finite number of non-unit 'spikes'. The spiked covariance matrix assumed here
includes the well-known compound symmetry matrix $\Sigma_{cs}(\rho) = I_p + \rho J_p$, where
$J_p$ is the $p \times p$ matrix of ones. The main results show that rLRT($\lambda$) is
asymptotically normal under both (i) and (ii); the asymptotic means differ but the variances
are the same. The main results are useful for testing various hypotheses on a one-sample
covariance matrix. To be specific, first, in testing $H_0: \Sigma = I_p$, (i) provides the
asymptotic null distribution of rLRT($\lambda$). Second, combining (i) and (ii) provides the
asymptotic power for an arbitrary spiked alternative covariance matrix, including
$\Sigma_{cs}(\rho)$. Finally, the results with $\lambda = 1$ provide various asymptotic
distributions of the corrected LRT. Among these many applications, in this paper we focus on
the LRT for testing $H_0: \Sigma = I_p$, which has long been studied by many researchers
(Anderson, 2003; Bai et al., 2009; Chen et al., 2010; Jiang et al., 2012; Ledoit and Wolf, 2002).


The paper is organized as follows. In Section 2, we briefly review results from random
matrix theory that are essential to the asymptotic theory of the proposed rLRT. The results
include the limit of the empirical spectral distribution (ESD) of the sample covariance matrix
and the central limit theorem (CLT) for linear spectral statistics (LSS). In Section 3, we
formally define the rLRT and prove its asymptotic normality when the true covariance matrix
$\Sigma$ is $I_p$ or $\Sigma_{spike}$. In Section 4, the results developed in Section 3 are
applied to testing $H_0: \Sigma = I_p$. A numerical study is provided to compare the power of
the rLRT with those of other existing methods, including the corrected LRT and the non-LRT
tests of Ledoit and Wolf (2002) and Chen et al. (2010). In Section 5, we conclude the paper
with a discussion of several technical details of the rLRT, for example, close spiked
eigenvalues.


2 Random matrix theory 

In this section, some useful properties of linear spectral statistics of the sample covariance
matrix are introduced. The true covariance matrix $\Sigma$ is either the identity or one from a
spiked population model.

The following notation is used throughout the paper. Let $M$ be a real-valued symmetric
matrix of size $p \times p$ and let $\alpha_j(M)$ be the $j$-th largest eigenvalue of $M$, with
the natural labeling $\alpha_p(M) \le \cdots \le \alpha_1(M)$. The spectral distribution (SD) of
$M$ is defined by

$$F^{M}(t) := \frac{1}{p}\sum_{j=1}^{p} \delta_{\alpha_j(M)}(t), \quad t \in \mathbb{R},$$

where $\delta_\alpha(t)$ is a point mass function that can also be written, with a slight abuse
of notation, as $\delta_\alpha(t) = I(\alpha \le t)$. Here, $I(A)$ denotes the indicator function
of a set $A$.


2.1 Limiting spectral distribution of sample covariance matrix 

Let $\{z_{ij}\}_{i,j \ge 1}$ be an infinite double array of independent and identically
distributed real-valued random variables with $E z_{11} = 0$, $E z_{11}^2 = 1$ and
$E z_{11}^4 = 3$. Let $Z_n = \{z_{ij}, i = 1, 2, \ldots, n, j = 1, 2, \ldots, p\}$ be the top-left
$n \times p$ block of the infinite double array. We assume that both $n$ and $p$ diverge and
that their ratio $\gamma' := p/n$ converges to a positive constant $\gamma$. The data matrix and
the uncentered sample covariance matrix are $X_n = Z_n \Sigma_p^{1/2}$ and
$S_n^0 = \frac{1}{n} X_n^\top X_n$, respectively, where $\{\Sigma_p, p = 1, 2, \ldots\}$ is a
sequence of $p \times p$ nonrandom symmetric matrices. Note that the fourth moment condition
$E z_{11}^4 = 3$ is used later in Proposition 1.


In the random matrix theory literature (Bai et al., 2009; Bai and Silverstein, 2004),
the limiting distribution of the empirical SD $F^{S_n^0}(\cdot)$ is determined by both the
limits of $p/n$ and $F^{\Sigma_p}(\cdot)$. Specifically, if $H_p(\cdot) := F^{\Sigma_p}(\cdot)$
converges in distribution to $H(\cdot)$, the random distribution function $F^{S_n^0}(\cdot)$
converges in distribution to a nonrandom distribution function, say $F^{\gamma,H}(\cdot)$, with
probability one. The definition of $F^{\gamma,H}$ is given by its Stieltjes transform
$m^{\gamma,H}(z)$, which is the unique solution of the following system of equations:

$$m^{\gamma,H}(z) = \frac{1}{\gamma}\,\underline{m}^{\gamma,H}(z) + \frac{1-\gamma}{\gamma z}, \qquad (3)$$

$$z = -\frac{1}{\underline{m}^{\gamma,H}(z)} + \gamma \int \frac{t}{1 + t\,\underline{m}^{\gamma,H}(z)}\,dH(t), \qquad (4)$$

on $z \in \{z : \mathrm{Im}(z) > 0\}$. Generally, $\underline{m}^{\gamma,H}(z)$ is known as the
Stieltjes transform of the limiting SD of the so-called companion matrix for $S_n^0$, which is
defined by $\underline{S}_n^0 := \frac{1}{n} X_n X_n^\top$. The density of $F^{\gamma,H}$ can be
calculated from $\underline{m}^{\gamma,H}$ by the inversion formula,

$$\frac{dF^{\gamma,H}}{dx}(x) = \lim_{z \to x} \frac{1}{\gamma\pi}\,\mathrm{Im}[\underline{m}^{\gamma,H}(z)], \quad x \in \mathbb{R},\ z : \mathrm{Im}(z) > 0. \qquad (5)$$

In the special case $\Sigma_p = I_p$, we have $H_p(t) = \delta_1(t)$ and the corresponding
spectral distribution follows the Marcenko-Pastur law. To see this, note that the second
equation, (4), can be rewritten as

$$z = -\frac{1}{\underline{m}^{\gamma,\delta_1}(z)} + \frac{\gamma}{1 + \underline{m}^{\gamma,\delta_1}(z)} \qquad (6)$$

when $H = \delta_1$. By the inversion formula, (6) yields the probability density function of
the Marcenko-Pastur distribution indexed by $\gamma$ when $0 < \gamma < 1$,

$$\frac{dF^{\gamma,\delta_1}}{dx}(x) = \frac{1}{2\pi\gamma x}\sqrt{(b(\gamma) - x)(x - a(\gamma))}, \quad a(\gamma) \le x \le b(\gamma), \qquad (7)$$

where $a(\gamma) := (1 - \sqrt{\gamma})^2$ and $b(\gamma) := (1 + \sqrt{\gamma})^2$.
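A quick numerical sanity check of the density in (7) may be helpful here. The sketch below is only an illustration (the helper names are hypothetical): it evaluates the density on a fine grid and confirms that its total mass and first moment are both close to 1, as expected for the Marcenko-Pastur law with $\Sigma_p = I_p$.

```python
import numpy as np

def mp_density(x, gamma):
    """Marcenko-Pastur density (7) for 0 < gamma < 1; zero outside [a(gamma), b(gamma)]."""
    a = (1.0 - np.sqrt(gamma)) ** 2
    b = (1.0 + np.sqrt(gamma)) ** 2
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    dens[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2.0 * np.pi * gamma * x[inside])
    return dens

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy helpers."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

gamma = 0.5
grid = np.linspace((1.0 - np.sqrt(gamma)) ** 2, (1.0 + np.sqrt(gamma)) ** 2, 200001)
total_mass = trapezoid(mp_density(grid, gamma), grid)
first_moment = trapezoid(grid * mp_density(grid, gamma), grid)
print(total_mass, first_moment)   # both should be close to 1
```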


2.2 Central limit theorem for linear spectral statistics 

Many multivariate statistical procedures are based on $F^{S_n}$, the empirical SD of the
centered sample covariance matrix $S_n$. Consider a family of functionals of the eigenvalues,
also called linear spectral statistics (LSS) or linear eigenvalue statistics:

$$\frac{1}{p}\sum_{j=1}^{p} g(l_j) = \int g(x)\,dF^{S_n}(x),$$

where $g$ is a function fulfilling certain complex-analytic conditions.

If the sample covariance matrix is uncentered, the central limit theorem (CLT) for the
corresponding LSS is developed in Bai and Silverstein (2004) and Bai et al. (2009). The
proposition below is adapted from Theorem 2.1 in Bai et al. (2009) and is used to establish the
asymptotic properties of the proposed regularized LRT under the null. Note that the centering
term of the CLT involves a finite-dimensional proxy $F^{\gamma',H_p}$.

Proposition 1 (Bai et al. (2009)). Let $T_n(g)$ be the functional

$$T_n(g) = p \int g(x)\,d\{F^{S_n^0}(x) - F^{\gamma',H_p}(x)\}. \qquad (8)$$

Suppose that the two functions $g_1$ and $g_2$ are complex analytic on an open domain containing
a closed interval $[a(\gamma), b(\gamma)]$ on the real axis. If $\gamma' = p/n \to \gamma \in (0,1)$
and $H_p$ converges in distribution to $\delta_1$, then the vector $(T_n(g_1), T_n(g_2))$
converges in distribution to a bivariate normal distribution with mean

$$\mu(g_i) = \frac{g_i(a(\gamma)) + g_i(b(\gamma))}{4} - \frac{1}{2\pi}\int_{a(\gamma)}^{b(\gamma)} \frac{g_i(x)}{\sqrt{4\gamma - (x - 1 - \gamma)^2}}\,dx \qquad (9)$$

for $i = 1, 2$, and variance

$$v(g_1, g_2) = -\frac{1}{2\pi^2}\oint\oint \frac{g_1(z_1)\,g_2(z_2)}{(\underline{m}(z_1) - \underline{m}(z_2))^2}\,d\underline{m}(z_1)\,d\underline{m}(z_2), \qquad (10)$$

where $\underline{m} = \underline{m}^{\gamma,\delta_1}$ is defined in (6). The two contours in
(10) are non-overlapping and both contain $[a(\gamma), b(\gamma)]$, the support of
$F^{\gamma,\delta_1}$.

The proposition requires the data to have a known population mean vector (taken to be
zero without loss of generality) and considers the non-centered sample covariance matrix
$S_n^0$. However, this is seldom true in practice, and it is common to use the "unbiased" and
"centered" sample covariance matrix, $\tilde{X}_n = B_n X_n$ and
$S_n = (1/\tilde{n})\tilde{X}_n^\top \tilde{X}_n$, where $\tilde{n} = n - 1$ and
$B_n = I_n - (1/n)J_n$. The extension of Proposition 1 to the centered sample covariance
matrix is studied in Zheng et al. (2015). In that paper, the authors prove that, under
the fourth moment assumption (the assumption $E z_{11}^4 = 3$ in Section 2.1), Proposition 1 is
still valid for $S_n$ if $T_n(g)$ is redefined by

$$T_n(g) = p \int g(x)\,d\{F^{S_n}(x) - F^{\tilde\gamma',H_p}(x)\}, \qquad (11)$$

where $\tilde{n} = n - 1$ is the adjusted sample size and $\tilde\gamma' = p/\tilde{n}$. This is
called the "substitution principle".
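The centering step can be illustrated with a short sketch. It assumes that $B_n$ is the usual centering matrix $I_n - (1/n)J_n$, with $J_n$ the $n \times n$ matrix of ones; the data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 5
X = rng.standard_normal((n, p)) + 2.0            # observations (rows) with a nonzero mean

B = np.eye(n) - np.ones((n, n)) / n              # assumed centering matrix I_n - (1/n) J_n
X_centered = B @ X                               # subtracts the column means
S_centered = X_centered.T @ X_centered / (n - 1) # uses the adjusted sample size n - 1

S_uncentered = X.T @ X / n                       # S_n^0, appropriate only for mean-zero data
gamma_adjusted = p / (n - 1)                     # ratio used by the substitution principle
print(np.allclose(S_centered, np.cov(X, rowvar=False)), gamma_adjusted)
```

Under this assumption, $S_n$ coincides with the usual unbiased sample covariance matrix (what `numpy.cov` returns by default), and the only change to the centering term of the CLT is that $p/n$ is replaced by $p/(n-1)$.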

The assumption $H = \delta_1$ roughly implies that the spectrum of $\Sigma_p$ is eventually
concentrated around one. One simple example is $\Sigma_p = I_p$. Then the SD is
$H_p = F^{I_p} = \delta_1$, which trivially converges to $\delta_1$. In addition, we note that
the spiked population model (Johnstone, 2001) has $H = \delta_1$ as the limiting SD, so that
Proposition 1 is applicable to it. In our setting, the spiked population model refers to data
whose covariance matrix $\Sigma_p$ has the following eigenvalue structure:

$$\underbrace{a_1, \ldots, a_1}_{n_1},\ \underbrace{a_2, \ldots, a_2}_{n_2},\ \ldots,\ \underbrace{a_k, \ldots, a_k}_{n_k},\ \underbrace{1, \ldots, 1}_{p-K}. \qquad (12)$$

Then the SD $H_p$ corresponding to $\Sigma_p$ is

$$H_p(t) = \frac{p-K}{p}\,\delta_1(t) + \frac{1}{p}\sum_{i=1}^{k} n_i\,\delta_{a_i}(t), \qquad (13)$$

where $K = n_1 + \cdots + n_k$ is a fixed finite integer, not depending on $n$, so that the
$p - K$ unit eigenvalues eventually dominate $H_p$ when $p$ is large. Thus, the limiting SD
remains $H = \delta_1$.

To study the power of the proposed regularized LRT, it is useful to study how to apply
Proposition 1 to the spiked population model. Although the limiting SD is simple
($\delta_1$), the spiked population model poses several difficulties for the use of
Proposition 1. In the spiked model, the SD $H_p$ in (13) has masses at $k + 1$ distinct points,
and $\underline{m}^{\gamma',H_p}(z)$ is the solution to a polynomial equation of degree $k + 2$.
A polynomial equation has a closed-form solution only when its degree is less than or equal
to 4. Therefore, if $k \ge 3$, we do not have an analytic form of the solution. To resolve this
difficulty, Wang et al. (2014) provide an approximation formula for
$\int g\,dF^{\gamma',H_p}$ in Proposition 1 for the spiked population model. The approximation
is constructed based on the idea that $\underline{m}^{\gamma',H_p}(z)$ and
$\underline{m}^{\gamma',\delta_1}(z)$ are close if $p$ is large.


Proposition 2 (Wang et al. (2014)). Suppose that $\gamma' < 1$ and $H_p$ is given by (13), with
$|a_i - 1| > \sqrt{\gamma'}$ for all $i = 1, 2, \ldots, k$. If a complex-valued function $g$ is
analytic on an open domain containing the interval $[a(\gamma'), b(\gamma')]$ and the $k$ points
$\varphi(a_i) := a_i + \frac{\gamma' a_i}{a_i - 1}$, $i = 1, 2, \ldots, k$, on the real axis, then
$\int g(x)\,dF^{\gamma',H_p}(x)$ in (8) can be approximated by

$$\int g(x)\,dF^{\gamma',H_p}(x) = -\frac{1}{2\pi i p}\oint_{\mathcal{C}} g\Big(-\frac{1}{m} + \frac{\gamma'}{1+m}\Big)\Big(\frac{K}{\gamma' m} - \sum_{i=1}^{k}\frac{n_i a_i^2 m}{(1 + a_i m)^2}\Big)\,dm \qquad (14)$$

$$\qquad + \frac{1}{2\pi i p}\oint_{\mathcal{C}} g'\Big(-\frac{1}{m} + \frac{\gamma'}{1+m}\Big)\sum_{i=1}^{k}\frac{(1 - a_i)n_i}{(1 + a_i m)(1 + m)}\Big(\frac{1}{m} - \frac{\gamma' m}{(1+m)^2}\Big)\,dm \qquad (15)$$

$$\qquad + \Big(1 - \frac{K}{p}\Big)\int g(x)\,dF^{\gamma',\delta_1}(x) + \frac{1}{p}\sum_{i=1}^{k} n_i\,g(\varphi(a_i)) + O\Big(\frac{1}{n^2}\Big), \qquad (16)$$

where $\underline{m} = \underline{m}^{\gamma',\delta_1}$ is defined in (6) with $\gamma$ replaced
by $\gamma'$, and $\mathcal{C}$ is a counterclockwise contour enclosing the interval
$\big[\frac{-1}{1-\sqrt{\gamma'}}, \frac{-1}{1+\sqrt{\gamma'}}\big]$ on the real axis.

The above proposition is a special case of Theorem 2 in Wang et al. (2014) when all the $a_i$'s
are distant spikes, i.e., $|a_i - 1| > \sqrt{\gamma}$. Note that Theorem 2 of Wang et al. (2014)
allows close spikes $a_i$, defined by $|a_i - 1| \le \sqrt{\gamma}$. In this paper, we focus on
alternative hypotheses with distant spikes in the power study. We finally remark that the
substitution principle is directly applicable to Wang et al.'s results; that is, one can
approximate $\int g(x)\,dF^{\tilde\gamma',H_p}(x)$ in (11) by Proposition 2 with
$\gamma' = p/n$ in the formula replaced by $\tilde\gamma' = p/(n-1)$.


3 Main results 


In this section, the asymptotic results for the rLRT are presented. Here, the rLRT is defined
via the linear shrinkage estimator instead of the sample covariance matrix:

$$\mathrm{rLRT}(\lambda) := \mathrm{tr}(\hat\Sigma) - \log|\hat\Sigma| - p, \quad \text{where } \hat\Sigma := \lambda S_n + (1 - \lambda) I_p.$$

The shrinkage intensity $\lambda$ is fixed and chosen from $(0,1)$. Define
$\psi(x) = \lambda x + (1 - \lambda)$ and $g(x) = \psi(x) - \log\{\psi(x)\} - 1$. We consider
$\mathrm{rLRT}(\lambda) = p\int g(x)\,dF^{S_n}(x)$, where the sample covariance matrix is the
centered version. Then Proposition 1, along with the substitution principle, yields the
following result.

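Theorem 1. Let $g(x) = \psi(x) - \log\{\psi(x)\} - 1$ and $\psi(x) = \lambda x + (1 - \lambda)$
with fixed $\lambda \in (0,1)$. Suppose that $\Sigma_p = I_p$. If
$\tilde\gamma' = p/\tilde{n} \to \gamma \in (0,1)$ with $\tilde{n} = n - 1$, then

$$T_n(g) = \mathrm{rLRT}(\lambda) - p\int g(x)\,dF^{\tilde\gamma',\delta_1}(x)$$

converges in distribution to the normal distribution with mean

$$\mu(g) = -\frac{1}{2}\log\sqrt{(1 + \lambda\gamma)^2 - 4\lambda^2\gamma} + \frac{1}{4\pi}\int_0^{2\pi}\log\big(1 + \lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta\big)\,d\theta \qquad (17)$$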

and variance 


, . 1 A , . ^ . A 7 , M — N 

*>(») = 7- A(i + 7 - A7) + — - logj , 


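where $M < 0 < N$ are given by

$$M, N = M(\lambda,\gamma), N(\lambda,\gamma) := \frac{-(1 - 2\lambda + \lambda\gamma) \mp \sqrt{(1 - 2\lambda + \lambda\gamma)^2 + 4\lambda(1 - \lambda)}}{2(1 - \lambda)}, \qquad (19)$$

with $M$ taking the minus sign and $N$ the plus sign. The variance above is equation (18).

The detailed proof of Theorem 1 is given in Appendix A.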

Note that when $\lambda = 1$, $\mu(g) = -\log(1 - \gamma)/2$ and
$v(g) = -2\gamma - 2\log(1 - \gamma)$, which are consistent with the results in Bai et al.
(2009). To see this, observe that the integral in the mean function,
$\int_0^{2\pi}\log(1 + \lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta)\,d\theta$, approaches
zero by the dominated convergence theorem. In addition, it can be shown that $M$ goes to
$-1/(1 - \gamma)$ and $N$ goes to $+\infty$ as $\lambda \to 1$, using the approximation formula
$\sqrt{x + \Delta x} \approx \sqrt{x} + \Delta x/(2\sqrt{x})$.
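Next, consider the finite-dimensional proxy

$$\int g(x)\,dF^{\tilde\gamma',\delta_1}(x).$$

From the density function of the Marcenko-Pastur law (7) and the fact that
$1 = \int x\,dF^{\tilde\gamma',\delta_1}(x) = \int 1\,dF^{\tilde\gamma',\delta_1}(x)$,

$$\int g(x)\,dF^{\tilde\gamma',\delta_1}(x) = \int_{a(\tilde\gamma')}^{b(\tilde\gamma')} g(x)\,\frac{\sqrt{(b(\tilde\gamma') - x)(x - a(\tilde\gamma'))}}{2\pi x \tilde\gamma'}\,dx = -\int_{a(\tilde\gamma')}^{b(\tilde\gamma')} \log\{\psi(x)\}\,\frac{\sqrt{(b(\tilde\gamma') - x)(x - a(\tilde\gamma'))}}{2\pi x \tilde\gamma'}\,dx.$$

By substituting $x = 1 + \tilde\gamma' - 2\sqrt{\tilde\gamma'}\cos\theta$, we have an alternative
representation of the integral,

$$\int g(x)\,dF^{\tilde\gamma',\delta_1}(x) = -\frac{2}{\pi}\int_0^{\pi} \frac{\log\big(1 + \lambda\tilde\gamma' - 2\lambda\sqrt{\tilde\gamma'}\cos\theta\big)\,\sin^2\theta}{1 + \tilde\gamma' - 2\sqrt{\tilde\gamma'}\cos\theta}\,d\theta. \qquad (20)$$

It is remarked that the right-hand side of (20) can be evaluated via standard numerical
integration techniques.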
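The numerical integration mentioned above can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes the trigonometric forms of the centering term (20) and of the mean (17), uses a hand-written trapezoidal rule, and the function names are hypothetical. For $\lambda = 1$ the outputs can be compared with the closed forms known for the corrected LRT.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule; accurate here because the integrands are smooth and periodic."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def centering_term(lam, gamma, num=20001):
    """Integral of g against the Marcenko-Pastur law, via the substitution used in (20)."""
    theta = np.linspace(0.0, np.pi, num)
    x = 1.0 + gamma - 2.0 * np.sqrt(gamma) * np.cos(theta)
    psi = lam * x + (1.0 - lam)                      # equals 1 + lam*gamma - 2*lam*sqrt(gamma)*cos(theta)
    return 2.0 / np.pi * trapezoid(-np.log(psi) * np.sin(theta) ** 2 / x, theta)

def asymptotic_mean(lam, gamma, num=20001):
    """Mean of the limiting normal distribution, following the form of (17)."""
    theta = np.linspace(0.0, 2.0 * np.pi, num)
    integrand = np.log(1.0 + lam * gamma - 2.0 * lam * np.sqrt(gamma) * np.cos(theta))
    disc = (1.0 + lam * gamma) ** 2 - 4.0 * lam ** 2 * gamma
    return -0.25 * np.log(disc) + trapezoid(integrand, theta) / (4.0 * np.pi)

gamma = 0.5
print(centering_term(1.0, gamma), 1.0 - (gamma - 1.0) / gamma * np.log(1.0 - gamma))
print(asymptotic_mean(1.0, gamma), -np.log(1.0 - gamma) / 2.0)
print(centering_term(0.5, gamma), asymptotic_mean(0.5, gamma))
```

In practice the centering term is multiplied by $p$ and evaluated at the adjusted ratio $p/(n-1)$ before it is subtracted from rLRT($\lambda$).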
Theorem 2 below establishes the asymptotic normality of the rLRT under the alternative
hypothesis that the true covariance matrix follows the spiked population model in Section 2.2.
It follows directly from Proposition 2.

Theorem 2. Let $g(x) = \psi(x) - \log\{\psi(x)\} - 1$ and $\psi(x) = \lambda x + (1 - \lambda)$
with fixed $\lambda \in (0,1)$.

Suppose that $\Sigma_p$ has SD
$H_p(t) = \frac{p-K}{p}\delta_1(t) + \frac{1}{p}\sum_{i=1}^{k} n_i\delta_{a_i}(t)$ as in (13),
with $|a_i - 1| > \sqrt{\gamma}$ for all $i = 1, 2, \ldots, k$. If
$\tilde\gamma' = p/\tilde{n} \to \gamma \in (0,1)$ with $\tilde{n} = n - 1$, then

$$\mathrm{rLRT}(\lambda) - p\int g(x)\,dF^{\tilde\gamma',H_p}(x) \Rightarrow N(\mu(g), v(g)),$$

where $\mu(g), v(g)$ are defined in Theorem 1 and

p j g{x)dF^'’^p{x) = j g{x)dF^'’^fix) + ^C{X,fi') + 0 

with 

1 ^ 1 K 

C(A,y) = A ■ — - A - — ^nilogV'{(p(ai)} 


2 = 1 


2=1 


K 


7 ' 


ylog(-M) + 


A 


K 

(1 - m 


Ui 


2=1 

1 


1 - ai 


K 


l + OiM/ K 


--T 


rii 


2 = 1 


afiM + 1) 


aa'M^ 


1 + OiM 1 — Oi 

7'M2 


M-N\ l + a*M (l + aiM)(M + l) 
1 


- 1 + 


(M + 1)2 


Oif ^ fi'iflMN + M + N) 


(M + l)(Ar + 1) 1^ 1 - a, (M + l)(Ar + 1) 

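where $\varphi(a) = a + \frac{\tilde\gamma' a}{a - 1}$ and both $M$ and $N$ in
$C(\lambda,\tilde\gamma')$ are $M = M(\lambda,\tilde\gamma')$ and $N = N(\lambda,\tilde\gamma')$.

The proof of the above theorem is provided in Appendix A.

Theorem 2 can be applied to $\Sigma_{cs}$, the covariance matrix with compound symmetry, which
is defined by

$$\Sigma_{cs} = \begin{pmatrix} 1 + \beta/p & \beta/p & \cdots & \beta/p \\ \beta/p & 1 + \beta/p & \cdots & \beta/p \\ \vdots & \vdots & \ddots & \vdots \\ \beta/p & \beta/p & \cdots & 1 + \beta/p \end{pmatrix} = I_p + \frac{\beta}{p} J_p. \qquad (21)$$

This matrix has a spiked eigenvalue structure: $1 + \beta$ for one eigenvalue and 1 for the
other $p - 1$ eigenvalues. The corresponding SD is
$H_p(t) = \frac{p-1}{p}\delta_1(t) + \frac{1}{p}\delta_{1+\beta}(t)$. Theorem 2 with
$K = 1$, $k = 1$, $n_1 = 1$, and $a_1 = 1 + \beta$ gives the following corollary.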
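The spiked structure of (21) is easy to confirm numerically: $J_p$ has eigenvalue $p$ on the all-ones vector and 0 elsewhere, so $I_p + (\beta/p)J_p$ has eigenvalues $1 + \beta$ (once) and 1 ($p - 1$ times). A minimal sketch with hypothetical values of $p$ and $\beta$:

```python
import numpy as np

def compound_symmetry(p, beta):
    """Sigma_cs = I_p + (beta / p) * J_p, as in (21)."""
    return np.eye(p) + (beta / p) * np.ones((p, p))

p, beta = 6, 1.5
eigenvalues = np.linalg.eigvalsh(compound_symmetry(p, beta))   # returned in ascending order
print(eigenvalues)
```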



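Corollary 1. Let $g(x) = \psi(x) - \log\{\psi(x)\} - 1$ and $\psi(x) = \lambda x + (1 - \lambda)$
with fixed $\lambda \in (0,1)$. Suppose that $\Sigma_p$ has SD
$H_p(t) = \frac{p-1}{p}\delta_1(t) + \frac{1}{p}\delta_{1+\beta}(t)$ with $\beta > \sqrt{\gamma}$.
If $\tilde\gamma' := p/\tilde{n} \to \gamma \in (0,1)$, where $\tilde{n} = n - 1$, then

$$\mathrm{rLRT}(\lambda) - p\int g(x)\,dF^{\tilde\gamma',H_p}(x) \Rightarrow N(\mu(g), v(g)),$$

where $\mu(g), v(g)$ are defined in Theorem 1 and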


g{x)dF'^'’^^{x) = ( 1 - - ) / gix)dF^'’^fix) + + 0{^ 


p 


7 'Ai 


P 




with 


C'(A,f) = A/3-log^/^{(^(l + /3)} + ^|^ + log(-^)| 

^l + {l + fi)M “ ( “ 1 + (1 + /3)m) 

A [ (1 + /3)(M + 1) f(l + /3)M^ ] 

(1-A)(M-Ar)| i + (i + /?)M (1 + (1 + /?)M)(M + 1) (M + l)2j’ 

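where $\varphi(a) = a + \frac{\tilde\gamma' a}{a - 1}$ and both $M$ and $N$ in
$C(\lambda,\tilde\gamma')$ are $M = M(\lambda,\tilde\gamma')$ and $N = N(\lambda,\tilde\gamma')$.

In the corollary, the condition $\beta > \sqrt{\gamma}$ means that the spiked eigenvalue
$1 + \beta$ is distant.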


4 Testing the identity covariance matrix 


The results in Section 3 can be used for testing various hypotheses on a one-sample covariance
matrix. In this section, we study the finite-sample properties of the proposed rLRT in testing
$H_0: \Sigma = I_p$. Additionally, we compare the power of the proposed rLRT with those of the
following existing procedures in the literature.

Ledoit and Wolf (2002) assume that $p/n \to \gamma \in (0, \infty)$ and propose the statistic

$$T_{LW} = \frac{1}{p}\mathrm{tr}\{(S_n - I)^2\} - \frac{p}{n}\Big\{\frac{1}{p}\mathrm{tr}(S_n)\Big\}^2 + \frac{p}{n}.$$

The asymptotic distribution of $T_{LW}$ is as follows: if both $n$ and $p$ increase with
$p/n \to \gamma \in (0, \infty)$, then $nT_{LW} - p$ converges in distribution to the normal
distribution with mean 1 and variance 4.

Bai et al. (2009) and Zheng et al. (2015) propose a corrected LRT for the case where
both $n$ and $p$ increase and $p/n$ converges to $\gamma \in (0,1)$. The corrected LRT statistic
is

$$\mathrm{cLRT} = \mathrm{tr}(S_n) - \log|S_n| - p.$$

They show that

$$T_{cLRT} = v(g)^{-1/2}\Big[\mathrm{cLRT} - p\int g(x)\,dF^{\tilde\gamma',\delta_1}(x) - \mu(g)\Big]$$

converges in distribution to the standard normal distribution, where $g(x) = x - \log x - 1$,
$\mu(g) = -\frac{\log(1 - \gamma)}{2}$, $v(g) = -2\log(1 - \gamma) - 2\gamma$, and
$\int g(x)\,dF^{\tilde\gamma',\delta_1}(x) = 1 - \frac{\tilde\gamma' - 1}{\tilde\gamma'}\log(1 - \tilde\gamma')$.


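Finally, Chen et al. (2010) propose to use the statistic

$$T_C = \frac{1}{p}T_{2,n} - \frac{2}{p}T_{1,n} + 1,$$

where, writing $X_i$ for the $i$-th observation vector,

$$T_{1,n} = \frac{1}{n}\sum_{i=1}^{n} X_i^\top X_i - \frac{1}{P_n^2}\sum_{i,j}^{*} X_i^\top X_j,$$

$$T_{2,n} = \frac{1}{P_n^2}\sum_{i,j}^{*} (X_i^\top X_j)^2 - \frac{2}{P_n^3}\sum_{i,j,k}^{*} X_i^\top X_j X_j^\top X_k + \frac{1}{P_n^4}\sum_{i,j,k,l}^{*} X_i^\top X_j X_k^\top X_l,$$

$P_n^r = n!/(n - r)!$, and $\sum^{*}$ is the summation over different indices. The asymptotic
theory suggests that, under the null, $nT_C$ converges in distribution to the normal
distribution with mean 0 and variance 4.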
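The first two reference procedures can be sketched in a few lines. This is a hedged illustration, not the code used in the paper: the function names are hypothetical, `numpy.cov` (divisor $n - 1$) stands in for the sample covariance matrix in both tests, and the limiting mean and variance of the cLRT are evaluated at the finite-sample ratio $p/(n-1)$. The statistic of Chen et al. (2010) is omitted because of its computational cost.

```python
import numpy as np

def ledoit_wolf_z(X):
    """Standardized Ledoit-Wolf statistic (n * T_LW - p - 1) / 2, roughly N(0, 1) under H0."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    D = S - np.eye(p)
    t_lw = np.trace(D @ D) / p - (p / n) * (np.trace(S) / p) ** 2 + p / n
    return float((n * t_lw - p - 1.0) / 2.0)

def clrt_z(X):
    """Standardized corrected LRT (requires p < n - 1), roughly N(0, 1) under H0."""
    n, p = X.shape
    g = p / (n - 1.0)                                  # adjusted ratio p / (n - 1)
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    clrt = np.sum(eig - np.log(eig) - 1.0)
    center = p * (1.0 - (g - 1.0) / g * np.log(1.0 - g))
    mean = -np.log(1.0 - g) / 2.0
    var = -2.0 * np.log(1.0 - g) - 2.0 * g
    return float((clrt - center - mean) / np.sqrt(var))

rng = np.random.default_rng(3)
X_null = rng.standard_normal((80, 16))                 # data generated under H0
print(ledoit_wolf_z(X_null), clrt_z(X_null))
```

A one-sided test at level 0.05 would reject when the standardized value exceeds the upper 5% point of the standard normal distribution.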


4.1 Power comparison with the cLRT 

The asymptotic power curves of the cLRT and rLRT for the compound symmetry alternative can be
obtained using Corollary 1. When $\Sigma_p = \Sigma_{cs}(\beta/p)$ and
$\tilde\gamma' \to \gamma$, the probability of rejecting $H_0: \Sigma_p = I_p$ at level $\eta$ is

$$1 - \Phi\Big(\Phi^{-1}(1 - \eta) + \frac{1}{\sqrt{v(g)}}\Big\{\int g(x)\,dF^{\tilde\gamma',\delta_1}(x) - C(\lambda,\tilde\gamma')\Big\}\Big), \quad \beta > \sqrt{\gamma}, \qquad (22)$$

where $C(\lambda,\tilde\gamma')$ is defined in Corollary 1 and $\Phi(\cdot)$ denotes the
cumulative distribution function of the standard normal distribution.

The powers of the cLRT and rLRT with $\lambda = 0.4$ and $\lambda = 0.7$ are plotted in
Figure 1. Each panel of Figure 1 compares the powers of the cLRT and rLRT for a different
sample size $n$. In each panel, the results for $\beta < \sqrt{\gamma}$ (close spike) are also
included to study the performance when the assumption of a distant spike in Corollary 1 is
violated. The results in Figure 1 suggest that Theorem 2 would be applicable when there is a
"close spike" eigenvalue. More detailed discussions are given in Section 5.

We find that in all cases the rLRT has higher empirical power than the cLRT for the
chosen values of $\lambda = 0.4$ and 0.7. We also find that the empirical power curve increases
to 1 less rapidly if $\lambda$ or $\gamma$ is closer to 1. In addition, although we do not report
the details, the empirical curves converge quickly and change only slightly after $n = 80$ for
the selected values of $\lambda$ and $\gamma$.

To better understand the power gain due to the use of the rLRT, we plot the empirical
densities of the rLRT and cLRT under the null and four alternative hypotheses (A1)-(A4)
(used in Section 4.2) in Figure 2. The figure shows that (i) the variances of the rLRT are
smaller than those of the cLRT under both the null and the alternative hypotheses and (ii) the
distances between the null and the alternative distributions are larger for the rLRT than for
the cLRT. Accordingly, the rLRT has larger power than the cLRT, and we will see in the next
section that this is true regardless of the choice of $n$ and $p$.
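The variance reduction in point (i) can be reproduced under the null with a small Monte Carlo experiment. The sketch below is only illustrative: the replication count and seed are arbitrary assumptions, and far fewer replications are used than in the paper.

```python
import numpy as np

def shrunken_lrt(eig, lam):
    """Sum of psi - log(psi) - 1 over psi = lam * l_i + (1 - lam); lam = 1 gives the cLRT statistic."""
    psi = lam * eig + (1.0 - lam)
    return float(np.sum(psi - np.log(psi) - 1.0))

rng = np.random.default_rng(2024)
n, p, lam, reps = 40, 32, 0.5, 300                   # n, p and lam as in Figure 2; reps is arbitrary
clrt_draws, rlrt_draws = [], []
for _ in range(reps):
    X = rng.standard_normal((n, p))                  # null hypothesis: Sigma = I_p
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    clrt_draws.append(shrunken_lrt(eig, 1.0))
    rlrt_draws.append(shrunken_lrt(eig, lam))

var_clrt = float(np.var(clrt_draws, ddof=1))
var_rlrt = float(np.var(rlrt_draws, ddof=1))
print(var_clrt, var_rlrt)                            # the rLRT variance should be much smaller
```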


4.2 Power comparison with other existing procedures 

In this section, we numerically compare the empirical sizes and powers of the rLRT statistic
with those of other existing tests, namely the corrected LRT (cLRT) of Bai et al. (2009), the
invariant test of Ledoit and Wolf (2002), and the non-parametric test of Chen et al. (2010).

In this study, random samples of size $n$ are generated from the $p$-dimensional multivariate
normal distribution $\mathrm{MVN}_p(0, \Sigma)$. The covariance matrix is set to
$\Sigma = I_p$ to obtain the empirical sizes. The sample size $n$ is chosen as 20, 40, 80, and
160 and, for each $n$, $\gamma = \gamma' = p/n$ is chosen as 0.2, 0.5, and 0.8. For example,
$p = 32$, 80, and 128 are considered for $n = 160$ in the simulation. We take 0.05 as the level
of significance. The shrinkage intensity $\lambda$ of rLRT($\lambda$) is selected from 0.2, 0.5,
and 0.8 to investigate the effect of the magnitude of the linear shrinkage. This means that we
compare the empirical sizes of cLRT, rLRT(0.2), rLRT(0.5), and rLRT(0.8) under varying $n$ and
$\gamma$.

Figure 1: Analytic and empirical power curves for the rLRT and cLRT (panels: n = 20, n = 40,
n = 80).

To compare the powers, we consider the following four alternatives: (A1) independent
but heteroscedastic variance case $\Sigma = \mathrm{diag}(2, 2, \ldots, 2, 1, 1, \ldots, 1)$,
where the number of 2's is $\min\{1, \lfloor 0.2p \rfloor\}$ and $\lfloor r \rfloor$ is the
round-down of $r$; (A2) independent with a single diverging spiked eigenvalue,
$\Sigma = \mathrm{diag}(1 + 0.2p, 1, 1, \ldots, 1)$; (A3) compound symmetry
$\Sigma = \Sigma_{cs}(0.2)$; and (A4) compound symmetry $\Sigma = \Sigma_{cs}(0.1)$ as defined
in (21). Here, the compound symmetry matrix $\Sigma_{cs}(\rho)$ has a single spiked eigenvalue
$1 + \rho p$ and $p - 1$ non-spiked eigenvalues of 1. Thus, (A2) and (A3) have identical
spectra. The sample size $n$ and $\gamma$ are chosen to be the same as those under the null.

Figure 2: Empirical density functions of the rLRT and the cLRT under the null and four
alternative hypotheses when $n = 40$, $p = 32$, and $\lambda = 0.5$: (A1) independent but
heteroscedastic variance case $\Sigma = \mathrm{diag}(2, 2, \ldots, 2, 1, 1, \ldots, 1)$, where
the number of 2's is $\min\{1, \lfloor 0.2p \rfloor\}$ and $\lfloor r \rfloor$ is the round-down
of $r$; (A2) independent with a single diverging spiked eigenvalue,
$\Sigma = \mathrm{diag}(1 + 0.2p, 1, 1, \ldots, 1)$; (A3) compound symmetry
$\Sigma = \Sigma_{cs}(0.2)$; and (A4) compound symmetry $\Sigma = \Sigma_{cs}(0.1)$ as defined
in (21).
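The four alternatives can be constructed directly from their descriptions. The sketch below is illustrative only: the function name is hypothetical, the number of 2's in (A1) follows the text literally, and $\Sigma_{cs}(\rho)$ is taken as $I_p + \rho J_p$. It also confirms numerically that (A2) and (A3) share the same spectrum.

```python
import numpy as np

def alternatives(p):
    """Covariance matrices for (A1)-(A4) as described in the text."""
    ones = np.ones((p, p))
    k = min(1, int(np.floor(0.2 * p)))               # number of 2's, as stated for (A1)
    a1 = np.diag(np.r_[np.full(k, 2.0), np.ones(p - k)])
    a2 = np.diag(np.r_[1.0 + 0.2 * p, np.ones(p - 1)])
    a3 = np.eye(p) + 0.2 * ones                      # Sigma_cs(0.2)
    a4 = np.eye(p) + 0.1 * ones                      # Sigma_cs(0.1)
    return {"A1": a1, "A2": a2, "A3": a3, "A4": a4}

p = 32
alts = alternatives(p)
spectra = {name: np.linalg.eigvalsh(mat) for name, mat in alts.items()}
print(np.allclose(spectra["A2"], spectra["A3"]), spectra["A4"][-1])

# One simulated data set of size n = 40 under (A3):
rng = np.random.default_rng(5)
X = rng.multivariate_normal(np.zeros(p), alts["A3"], size=40)
```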

The empirical sizes and powers of the listed methods are reported in Table 1. First, the
empirical sizes of all the tests approach the nominal level 0.05 as $n$ increases. However, the
size of the Ledoit and Wolf (2002) test converges more slowly and shows more upward bias than
the sizes of the other tests in all cases considered here. For this reason, the powers of
Ledoit and Wolf (2002) after empirically correcting the size are also reported in Table 1,
where the cut-off value is determined from 100,000 simulated test statistics under the null
for each simulation setting. Second, the empirical powers of the rLRTs are higher than those
of the cLRT in all cases considered. In addition, it is interesting to note that the power
improvement is especially large in the case $\gamma = 0.8$ (when $p$ is relatively large).
Third, compared with the tests of Chen et al. (2010) and (the bias-corrected version of)
Ledoit and Wolf (2002), the proposed rLRT(0.5) and rLRT(0.2) have higher empirical power in
most cases. Finally, we remark that the computational cost of Chen et al. (2010) is at least
$O(pn^4)$ due to the fourth moment calculation, so it is not suitable for data with large $n$.
In fact, to test data with $n = 500$, Chen et al. (2010) takes tens of hours to finish all the
computation (using C code and an Intel Core-i7 CPU), whereas the other tests require only
seconds.


5 Discussion 

We conclude the paper with a few additional issues concerning the proposed rLRT that are not
fully discussed in the main body of the paper.

First, we consider the case where $n < p$; equivalently, $\gamma > 1$. In this case, the cLRT
is not well-defined because the logarithm term, $\log|S_n| = \sum_i \log l_i$, contains some
zero $l_i$'s. On the other hand, the rLRT is still well-defined, because the argument of the
corresponding logarithm term $\sum_i \log\psi(l_i)$ satisfies
$\psi(l_i) \ge 1 - \lambda > 0$ whenever $\lambda < 1$. Since the CLT for linear spectral
statistics holds for $\gamma \in [0, \infty)$ (Bai and Silverstein, 2004), it is possible to
extend Theorems 1 and 2 in this paper to the case where $\gamma \in [1, \infty)$.
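The contrast between the two statistics when $n < p$ can be seen in a small example. The dimensions, seed and tolerance below are arbitrary assumptions used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, lam = 20, 30, 0.5                              # more variables than observations
X = rng.standard_normal((n, p))
eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
eig = np.clip(eig, 0.0, None)                        # remove tiny negative round-off values

rank = int(np.sum(eig > 1e-10))                      # at most n - 1 nonzero eigenvalues
psi = lam * eig + (1.0 - lam)                        # bounded below by 1 - lam > 0
rlrt = float(np.sum(psi - np.log(psi) - 1.0))
print(rank, psi.min(), rlrt)                         # log|S_n| is undefined here, but rLRT is finite
```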

Second, we discuss the case of close spiked eigenvalues. In the compound symmetry model (21),
the close spiked eigenvalues are those for which the spike $1 + \beta$ is smaller than
$1 + \sqrt{\gamma}$. As shown in Figure 1, it appears that the power curves over the interval
$\beta \in (0, \sqrt{\gamma})$ could be obtained simply by extending the formula of
Corollary 1 to $(0, \sqrt{\gamma})$. We remark, however, that if we follow Wang et al. (2014),
the term $\log\psi\{\varphi(1 + \beta)\}$ in Corollary 1 should be omitted when
$\beta \in (0, \sqrt{\gamma})$, leading to an inconsistency between the analytical and
empirical power curves on $(0, \sqrt{\gamma})$.

Finally, the selection of shrinkage intensity A is still not well understood for hypothesis 
testing. As a reviewer points out, when A approaches 0, the rLRT becomes irrelevant with 
the alternative covariance matrix and its power is expected to be close to the size. Thus, an 
appropriate selection of A is important for good performance of the rLRT. The selection of A 


for the purpose of improved estimation is well studied in the literature, for example, Ledoit 


and Wolf (2004), Schafer and Strimmer (2005) and Warton (2008). However, our additional 


numerical study shows that such a choice of A for the estimation purpose cannot achieve a 
power gain in testing problem. The optimal selection of A for the hypothesis testing needs 
further research. 
  γ     n     cLRT    rLRT(0.8)   rLRT(0.5)   rLRT(0.2)   Chen    LW

Null
  0.2   20    0.085   0.078       0.078       0.083       0.087   0.089
  0.2   40    0.069   0.066       0.068       0.073       0.074   0.083
  0.2   80    0.062   0.059       0.061       0.064       0.070   0.076
  0.5   20    0.068   0.059       0.065       0.072       0.068   0.104
  0.5   40    0.061   0.055       0.059       0.064       0.075   0.098
  0.5   80    0.056   0.053       0.056       0.060       0.068   0.091
  0.8   20    0.065   0.055       0.063       0.068       0.068   0.125
  0.8   40    0.058   0.052       0.056       0.059       0.058   0.120
  0.8   80    0.055   0.051       0.054       0.056       0.064   0.113

A1: Indep. with hetero. variance
  0.2   20    0.370   0.444       0.511       0.550       0.459   0.517 (0.440)
  0.2   40    0.394   0.489       0.579       0.632       0.540   0.596 (0.527)
  0.2   80    0.951   0.984       0.996       0.998       0.989   0.991 (0.987)
  0.5   20    0.265   0.478       0.604       0.653       0.478   0.574 (0.471)
  0.5   40    0.569   0.880       0.961       0.977       0.851   0.897 (0.849)
  0.5   80    0.967   1.000       1.000       1.000       1.000   1.000 (0.999)
  0.8   20    0.180   0.580       0.703       0.741       0.459   0.629 (0.492)
  0.8   40    0.392   0.959       0.991       0.995       0.854   0.924 (0.868)
  0.8   80    0.859   1.000       1.000       1.000       0.999   1.000 (0.999)

A2: Indep. with single diverging spike 1 + 0.2p
  0.2   20    0.280   0.341       0.403       0.440       0.351   0.408 (0.331)
  0.2   40    0.724   0.810       0.871       0.899       0.853   0.882 (0.848)
  0.2   80    0.999   1.000       1.000       1.000       1.000   1.000 (1.000)
  0.5   20    0.385   0.596       0.698       0.739       0.664   0.729 (0.650)
  0.5   40    0.890   0.979       0.993       0.996       0.989   0.996 (0.994)
  0.5   80    1.000   1.000       1.000       1.000       1.000   1.000 (1.000)
  0.8   20    0.349   0.765       0.838       0.864       0.795   0.879 (0.816)
  0.8   40    0.867   0.997       0.999       1.000       0.999   1.000 (0.999)
  0.8   80    1.000   1.000       1.000       1.000       1.000   1.000 (1.000)

A3: Compound symmetry with ρ = 0.2
  0.2   20    0.280   0.343       0.404       0.440       0.382   0.410 (0.333)
  0.2   40    0.726   0.810       0.870       0.900       0.835   0.882 (0.848)
  0.2   80    0.999   1.000       1.000       1.000       1.000   1.000 (1.000)
  0.5   20    0.387   0.595       0.695       0.736       0.669   0.727 (0.649)
  0.5   40    0.888   0.979       0.992       0.996       0.995   0.996 (0.993)
  0.5   80    1.000   1.000       1.000       1.000       1.000   1.000 (1.000)
  0.8   20    0.351   0.764       0.837       0.862       0.796   0.877 (0.816)
  0.8   40    0.865   0.997       0.999       1.000       0.999   1.000 (0.999)
  0.8   80    1.000   1.000       1.000       1.000       1.000   1.000 (1.000)

A4: Compound symmetry with ρ = 0.1
  0.2   20    0.137   0.162       0.195       0.217       0.187   0.201 (0.140)
  0.2   40    0.280   0.358       0.441       0.491       0.400   0.454 (0.380)
  0.2   80    0.796   0.885       0.939       0.962       0.948   0.953 (0.940)
  0.5   20    0.153   0.252       0.330       0.368       0.292   0.365 (0.266)
  0.5   40    0.405   0.659       0.782       0.830       0.772   0.828 (0.767)
  0.5   80    0.939   0.996       0.999       1.000       1.000   1.000 (1.000)
  0.8   20    0.139   0.363       0.452       0.490       0.403   0.526 (0.392)
  0.8   40    0.376   0.838       0.911       0.936       0.912   0.952 (0.920)
  0.8   80    0.915   1.000       1.000       1.000       1.000   1.000 (1.000)

Table 1: Summary of sizes and powers over 100,000 replications, except for Chen et al. (2010), 
which uses 1,000 replications (due to heavy computation). The empirically 
corrected powers of Ledoit and Wolf (2002) are reported in the parentheses. For each row, 
the maximum powers are highlighted in bold and the maximum powers among the four 
LRT-based tests are underlined. The powers for n = 160 are all equal to 1 and are removed 
from the table. 
A Proof of the main theorems 

A.1 Proof of Theorem 1 


Theorem 1 is a direct consequence of the Proposition that gives the limiting mean and variance of linear spectral statistics. Here, we calculate its two integrals, the mean integral and the variance integral (10), for 

\[
g(z) = g_1(z) = g_2(z) = \psi(z) - \log(\psi(z)) - 1,
\]
where $\psi(z) = \lambda z + (1-\lambda)$. 

Mean: Using the same Proposition and the substitution $x = 1 + \gamma - 2\sqrt{\gamma}\cos\theta$, 


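\begin{align*}
\mu(g_1) &= \frac{g_1(a(\gamma)) + g_1(b(\gamma))}{4} - \frac{1}{2\pi}\int_{a(\gamma)}^{b(\gamma)} \frac{g_1(x)}{\sqrt{4\gamma - (x-1-\gamma)^2}}\,dx \\
&= \frac{\lambda\gamma - \log\sqrt{(1+\lambda\gamma)^2 - 4\lambda^2\gamma}}{2} - \frac{1}{2\pi}\int_0^{\pi} \Big(\lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta - \log\big(1+\lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta\big)\Big)\,d\theta \\
&= -\frac{\log\sqrt{(1+\lambda\gamma)^2 - 4\lambda^2\gamma}}{2} + \frac{1}{2\pi}\int_0^{\pi} \log\big(1+\lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta\big)\,d\theta \\
&= -\frac{\log\sqrt{(1+\lambda\gamma)^2 - 4\lambda^2\gamma}}{2} + \frac{1}{4\pi}\int_0^{2\pi} \log\big(1+\lambda\gamma - 2\lambda\sqrt{\gamma}\cos\theta\big)\,d\theta,
\end{align*}
where $a(\gamma) = (1-\sqrt{\gamma})^2$ and $b(\gamma) = (1+\sqrt{\gamma})^2$ are the endpoints of the range of the substitution.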
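As a numerical sanity check of the reduction above (not part of the original proof), the following Python sketch compares the $\theta$-integral form of the mean with a closed form. The closed form rests on the standard identity $\int_0^{2\pi}\log(A - B\cos\theta)\,d\theta = 2\pi\log\{(A+\sqrt{A^2-B^2})/2\}$ for $A > |B|$, which is an added step here. The function names and the parameter values $\lambda = \gamma = 0.5$ are illustrative assumptions.

```python
import math


def g(x, lam):
    """g(x) = psi(x) - log(psi(x)) - 1 with psi(x) = lam * x + (1 - lam)."""
    psi = lam * x + (1.0 - lam)
    return psi - math.log(psi) - 1.0


def mean_by_quadrature(lam, gamma, n=4000):
    """Mean term from the theta-integral, using the midpoint rule on [0, pi]."""
    root = math.sqrt(gamma)
    a = (1.0 - root) ** 2  # lower endpoint a(gamma)
    b = (1.0 + root) ** 2  # upper endpoint b(gamma)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        x = 1.0 + gamma - 2.0 * root * math.cos(theta)
        total += g(x, lam)
    integral = total * h
    return (g(a, lam) + g(b, lam)) / 4.0 - integral / (2.0 * math.pi)


def mean_closed_form(lam, gamma):
    """Closed form obtained with the log-cosine identity (an added step)."""
    A = 1.0 + lam * gamma
    D = A * A - 4.0 * lam * lam * gamma  # (1 + lam*gamma)^2 - 4 lam^2 gamma
    return -0.25 * math.log(D) + 0.5 * math.log((A + math.sqrt(D)) / 2.0)


if __name__ == "__main__":
    print(mean_by_quadrature(0.5, 0.5), mean_closed_form(0.5, 0.5))
```

Setting $\lambda = 1$ in `mean_closed_form` gives $-\tfrac{1}{2}\log(1-\gamma)$, which is a convenient consistency check for the non-regularized case.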


Variance: We write $m_1 := m(z_1)$ and $m_2 := m(z_2)$ for notational simplicity. We have 

\[
v(g) = -\frac{1}{2\pi^2}\oint g(z_2(m_2)) \oint \frac{g(z_1(m_1))}{(m_1-m_2)^2}\,dm_1\,dm_2. \tag{23}
\]

To evaluate this integral with Cauchy's formula, we need to identify the points of singularity 
of $g(z(m_1))$. A singularity occurs when $\psi(z(m_1)) = 0$. Rewrite $\psi(z(m_1))$ 
as 

\[
\psi(z(m_1)) = \lambda\Big(-\frac{1}{m_1} + \frac{\gamma}{m_1+1}\Big) + 1-\lambda = \frac{(1-\lambda)(m_1-M)(m_1-N)}{m_1(m_1+1)},
\]
where 

\[
M, N = \frac{-(1-2\lambda+\lambda\gamma) \mp \sqrt{(1-2\lambda+\lambda\gamma)^2 + 4\lambda(1-\lambda)}}{2(1-\lambda)}.
\]
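The factorization can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $M$ denotes the negative root and $N$ the positive root, which is the reading consistent with the later requirement $N > 0$.

```python
import math


def roots_M_N(lam, gamma):
    """Roots of (1-lam) m^2 + (1 - 2 lam + lam*gamma) m - lam = 0, with M < 0 < N."""
    c = 1.0 - 2.0 * lam + lam * gamma
    disc = math.sqrt(c * c + 4.0 * lam * (1.0 - lam))
    M = (-c - disc) / (2.0 * (1.0 - lam))
    N = (-c + disc) / (2.0 * (1.0 - lam))
    return M, N


def psi_of_z(m, lam, gamma):
    """psi(z(m)) with z(m) = -1/m + gamma/(1+m) and psi(z) = lam*z + (1-lam)."""
    z = -1.0 / m + gamma / (1.0 + m)
    return lam * z + (1.0 - lam)


def psi_factored(m, lam, gamma):
    """The factored form (1-lam)(m-M)(m-N) / (m (m+1))."""
    M, N = roots_M_N(lam, gamma)
    return (1.0 - lam) * (m - M) * (m - N) / (m * (m + 1.0))


if __name__ == "__main__":
    M, N = roots_M_N(0.5, 0.5)
    print(M, N)  # M lies below -1 and N is positive for these values
    m = complex(-2.0, 0.7)  # an arbitrary test point
    print(abs(psi_of_z(m, 0.5, 0.5) - psi_factored(m, 0.5, 0.5)))
```

Because the quadratic is negative at both $m = -1$ and $m = 0$ when $0 < \lambda < 1$, the two roots lie outside $[-1, 0]$, so $M < -1$ and $N > 0$.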

Then, $M$ and $N$ are points of singularity. Next, choose contours $C_1$ and $C_2$ enclosing $-1$ and 
$M$, but not $0$ and $N$, such that on the contours the logarithm in $g(z)$ is single-valued. In 
addition, $C_1$ and $C_2$ are chosen so that they do not overlap, with $C_1$ lying inside $C_2$ so that $m_2$ is outside $C_1$. Applying integration by parts 
and Cauchy's formula, we have 

\begin{align*}
\oint_{C_1} \frac{g(z(m_1))}{(m_1-m_2)^2}\,dm_1
&= \oint_{C_1} \Big\{\frac{\lambda}{m_1^2} - \frac{\lambda\gamma}{(m_1+1)^2} + \frac{1}{m_1} + \frac{1}{m_1+1} - \frac{1}{m_1-M} - \frac{1}{m_1-N}\Big\}\frac{1}{m_1-m_2}\,dm_1 \\
&= 2\pi i\Big\{\frac{\lambda\gamma}{(m_2+1)^2} - \frac{1}{m_2+1} + \frac{1}{m_2-M}\Big\}.
\end{align*}

Then, 

\begin{align*}
v(g) &= -\frac{1}{2\pi^2}\oint_{C_2} g(z_2(m_2)) \oint_{C_1} \frac{g(z_1(m_1))}{(m_1-m_2)^2}\,dm_1\,dm_2 \\
&= -\frac{i}{\pi}\oint_{C_2} g(z_2(m_2))\Big\{\frac{\lambda\gamma}{(m_2+1)^2} - \frac{1}{m_2+1} + \frac{1}{m_2-M}\Big\}\,dm_2. \tag{26}
\end{align*}

Using integration by parts, 

\begin{align*}
\oint_{C_2} \frac{g(z(m_2))}{(m_2+1)^2}\,dm_2
&= \oint_{C_2} \Big\{\frac{\lambda}{m_2^2} - \frac{\lambda\gamma}{(m_2+1)^2} + \frac{1}{m_2} + \frac{1}{m_2+1} - \frac{1}{m_2-M} - \frac{1}{m_2-N}\Big\}\frac{1}{m_2+1}\,dm_2 \\
&= 2\pi i\Big\{\lambda - 1 + \frac{1}{1+N}\Big\}. \tag{27}
\end{align*}

Applying Lemma 1 and Cauchy's formula, we obtain 

\[
\oint_{C_2} \frac{g(z(m_2))}{m_2+1}\,dm_2 = -2\pi i\,\log\{(1-\lambda)(1+N)\} \tag{28}
\]
and 

\[
\oint_{C_2} \frac{g(z(m_2))}{m_2-M}\,dm_2 = -2\pi i\Big\{\frac{\lambda}{M} + \lambda + \log\frac{(1-\lambda)(M-N)}{M}\Big\}. \tag{29}
\]

Combining the results (26) - (29), we have the desired result of variance. 

□
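The three contour integrals can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it uses a counter-clockwise circle around the segment $[M, -1]$ that leaves $0$ and $N$ outside, splits $\log\psi(z(m))$ into two principal-branch logarithms that are each continuous on that circle, and combines (26)-(29) into the expression $v(g) = 2\{\lambda\gamma(\lambda - 1 + \frac{1}{1+N}) + \log((1-\lambda)(1+N)) - \frac{\lambda}{M} - \lambda - \log\frac{(1-\lambda)(M-N)}{M}\}$, which is derived here rather than quoted from the source. All function names are hypothetical.

```python
import cmath
import math


def roots_M_N(lam, gamma):
    c = 1.0 - 2.0 * lam + lam * gamma
    disc = math.sqrt(c * c + 4.0 * lam * (1.0 - lam))
    return (-c - disc) / (2.0 * (1.0 - lam)), (-c + disc) / (2.0 * (1.0 - lam))


def g_of_m(m, lam, M, N):
    """g(z(m)), with log(psi) split into two pieces that are single-valued on the circle."""
    psi = (1.0 - lam) * (m - M) * (m - N) / (m * (m + 1.0))
    log_psi = cmath.log((m - M) / (m + 1.0)) + cmath.log((1.0 - lam) * (m - N) / m)
    return psi - log_psi - 1.0


def circle_integral(f, center, radius, n=2000):
    """Trapezoid rule on a counter-clockwise circle."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        total += f(center + radius * e) * (1j * radius * e)
    return total * (2.0 * math.pi / n)


def check_variance_pieces(lam, gamma):
    """Return numerical and closed-form values of the integrals (27), (28), (29)."""
    M, N = roots_M_N(lam, gamma)
    center, radius = (M - 1.0) / 2.0, (-1.0 - M) / 2.0 + 0.5  # excludes 0 and N
    g = lambda m: g_of_m(m, lam, M, N)
    numeric = (
        circle_integral(lambda m: g(m) / (m + 1.0) ** 2, center, radius),
        circle_integral(lambda m: g(m) / (m + 1.0), center, radius),
        circle_integral(lambda m: g(m) / (m - M), center, radius),
    )
    two_pi_i = 2j * math.pi
    closed = (
        two_pi_i * (lam - 1.0 + 1.0 / (1.0 + N)),
        -two_pi_i * math.log((1.0 - lam) * (1.0 + N)),
        -two_pi_i * (lam / M + lam + math.log((1.0 - lam) * (M - N) / M)),
    )
    return numeric, closed


def variance_closed_form(lam, gamma):
    """v(g) assembled from (26)-(29); the assembly is an added step."""
    M, N = roots_M_N(lam, gamma)
    i27 = lam - 1.0 + 1.0 / (1.0 + N)
    i28 = -math.log((1.0 - lam) * (1.0 + N))
    i29 = -(lam / M + lam + math.log((1.0 - lam) * (M - N) / M))
    return 2.0 * (lam * gamma * i27 - i28 + i29)


if __name__ == "__main__":
    print(check_variance_pieces(0.5, 0.5))
    print(variance_closed_form(0.5, 0.5))
```

As $\lambda \to 1$ the assembled expression tends to $-2\log(1-\gamma) - 2\gamma$, which gives a second consistency check against the non-regularized case.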


A.2 Proof of Theorem 


We compute 

\[
\int g(x)\,dF^{\gamma',H_p}(x) = \int \{\lambda x - \lambda - \log(\lambda x + 1 - \lambda)\}\,dF^{\gamma',H_p}(x),
\]

where $H_p$ is the spectral distribution (SD) of the spiked population model, which is written as 

\[
H_p(t) = \frac{p-K}{p}\,\delta_1(t) + \frac{1}{p}\sum_{i=1}^{K} n_i\,\delta_{a_i}(t).
\]

Following the lines of Section 3 in Wang et al. (2014), 

\[
\int (\lambda x - \lambda)\,dF^{\gamma',H_p}(x) = \frac{\lambda}{p}\sum_{i=1}^{K} n_i a_i - \frac{\lambda K}{p} + O(n^{-2}).
\]

The difficult part lies in the evaluation of the integral of the logarithmic term. Using 
the labeling in the Proposition, we rewrite it as 

\[
\int \log(\lambda x + 1 - \lambda)\,dF^{\gamma',H_p}(x) = (14) + (15) + (16)
\]

and calculate (14), (15) and (16) separately. In the remainder of the proof, we write $m$ and 
$\gamma$ instead of $m'$ and $\gamma'$, respectively, for notational convenience. The terms (14) and (15) 
involve contour integrals. Recall that the contour $C$ in (14) and (15) encloses the closed 
interval $\big[-1/(1-\sqrt{\gamma}),\,-1/(1+\sqrt{\gamma})\big]$ on the real axis of the complex plane and encloses the poles $m = -1$ and 
$m = M$, where $M = M(\lambda,\gamma)$ is defined in (19). It is easy to show that $-1/(1-\sqrt{\gamma}) < M < -1/(1+\sqrt{\gamma})$ 
and $N > 0$ provided $\gamma \in (0,1)$ and $n$ is large, where $N = N(\lambda,\gamma)$ is from (19). 


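Following the lines of Section 3.3 of Wang et al. (2014), recall that 

\[
\lambda\Big(-\frac{1}{m} + \frac{\gamma}{1+m}\Big) + (1-\lambda) = \frac{(1-\lambda)(m-M)(m-N)}{m(m+1)}.
\]

Then 

\begin{align*}
(14) &= -\frac{1}{2\pi i p}\oint_C \log\Big\{\lambda\Big(-\frac{1}{m} + \frac{\gamma}{1+m}\Big) + (1-\lambda)\Big\}\Big(\frac{K}{\gamma m} - \sum_{i=1}^{K}\frac{n_i a_i^2 m}{(1+a_i m)^2}\Big)\,dm \\
&= -\frac{1}{2\pi i p}\oint_C \Big\{\log\frac{(1-\lambda)(m-N)}{m} + \log\frac{m-M}{m+1}\Big\}\Big(\frac{K}{\gamma m} - \sum_{i=1}^{K}\frac{n_i a_i^2 m}{(1+a_i m)^2}\Big)\,dm \\
&= -\frac{1}{2\pi i p}\oint_C \log\Big(\frac{m-M}{m+1}\Big)\Big(\frac{K}{\gamma m} - \sum_{i=1}^{K}\frac{n_i a_i^2 m}{(1+a_i m)^2}\Big)\,dm \\
&= A_1 + A_2,
\end{align*}
where the third equality holds because $\log\{(1-\lambda)(m-N)/m\}$ multiplied by the weight has no singularity inside $C$. 

Here, 

\begin{align*}
A_1 &= -\frac{K}{2\pi i p\gamma}\oint_C \log\Big(\frac{m-M}{m+1}\Big)\,d\log m
= \frac{K}{2\pi i p\gamma}\oint_C \log m\; d\log\Big(\frac{m-M}{m+1}\Big) \\
&= \frac{K}{2\pi i p\gamma}\,(M+1)\oint_C \frac{\log m}{(m+1)(m-M)}\,dm
= \frac{K}{p\gamma}\log(-M),
\end{align*}

and 

\begin{align*}
A_2 &= \frac{1}{2\pi i p}\oint_C \log\Big(\frac{m-M}{m+1}\Big)\sum_{i=1}^{K}\frac{n_i a_i^2 m}{(1+a_i m)^2}\,dm \\
&= \frac{1}{2\pi i p}\oint_C \log\Big(\frac{m-M}{m+1}\Big)\sum_{i=1}^{K} n_i\Big\{\frac{a_i}{1+a_i m} - \frac{a_i}{(1+a_i m)^2}\Big\}\,dm \\
&= A_3 - A_4,
\end{align*}

where 

\begin{align*}
A_3 &= \frac{1}{2\pi i p}\sum_{i=1}^{K} n_i\oint_C \log\Big(\frac{m-M}{m+1}\Big)\,d\log(1+a_i m)
= -\frac{1}{2\pi i p}\sum_{i=1}^{K} n_i\oint_C \log(1+a_i m)\; d\log\Big(\frac{m-M}{m+1}\Big) \\
&= -\frac{M+1}{2\pi i p}\sum_{i=1}^{K} n_i\oint_C \frac{\log(1+a_i m)}{(m+1)(m-M)}\,dm
= \frac{1}{p}\sum_{i=1}^{K} n_i\log(1-a_i) - \frac{1}{p}\sum_{i=1}^{K} n_i\log(1+a_i M),
\end{align*}

and 

\begin{align*}
A_4 &= \frac{1}{2\pi i p}\oint_C \log\Big(\frac{m-M}{m+1}\Big)\sum_{i=1}^{K}\frac{n_i a_i}{(1+a_i m)^2}\,dm
= \frac{1}{2\pi i p}\sum_{i=1}^{K} n_i\oint_C \frac{1}{1+a_i m}\; d\log\Big(\frac{m-M}{m+1}\Big) \\
&= \frac{M+1}{2\pi i p}\sum_{i=1}^{K} n_i\oint_C \frac{dm}{(1+a_i m)(m-M)(m+1)}
= \frac{1}{p}\sum_{i=1}^{K} n_i\Big(\frac{1}{1+a_i M} - \frac{1}{1-a_i}\Big).
\end{align*}

Combining the results, $(14) = A_1 + A_2 = A_1 + A_3 - A_4$, we have 

\[
(14) = \frac{K}{p\gamma}\log(-M) + \frac{1}{p}\sum_{i=1}^{K} n_i\log\Big(\frac{1-a_i}{1+a_i M}\Big) - \frac{1}{p}\sum_{i=1}^{K} n_i\Big(\frac{1}{1+a_i M} - \frac{1}{1-a_i}\Big).
\]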
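The closed form of (14) can be compared with a direct numerical evaluation of its contour integral. The sketch below makes several assumptions that are not in the source: it takes $K = \sum_i n_i$ (so that $H_p$ has unit mass), uses two illustrative spikes $a_i \in \{3, 5\}$ with $n_i = 1$, and integrates over a counter-clockwise circle around $[M, -1]$ that leaves $0$, $N$ and every $-1/a_i$ outside. The function names are hypothetical.

```python
import cmath
import math


def roots_M_N(lam, gamma):
    c = 1.0 - 2.0 * lam + lam * gamma
    disc = math.sqrt(c * c + 4.0 * lam * (1.0 - lam))
    return (-c - disc) / (2.0 * (1.0 - lam)), (-c + disc) / (2.0 * (1.0 - lam))


def term14_numeric(lam, gamma, spikes, p, n=2000):
    """Numerical value of (14); spikes is a list of (a_i, n_i) pairs."""
    M, N = roots_M_N(lam, gamma)
    K = sum(n_i for _, n_i in spikes)  # assumption: K is the total multiplicity
    center, radius = (M - 1.0) / 2.0, (-1.0 - M) / 2.0 + 0.2
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        m = center + radius * e
        # Both logarithms are kept, although only the first one contributes.
        log_psi = cmath.log((m - M) / (m + 1.0)) + cmath.log((1.0 - lam) * (m - N) / m)
        weight = K / (gamma * m) - sum(
            n_i * a_i ** 2 * m / (1.0 + a_i * m) ** 2 for a_i, n_i in spikes
        )
        total += log_psi * weight * (1j * radius * e)
    total *= 2.0 * math.pi / n
    return -total / (2j * math.pi * p)


def term14_closed(lam, gamma, spikes, p):
    """Closed form A_1 + A_3 - A_4."""
    M, _ = roots_M_N(lam, gamma)
    K = sum(n_i for _, n_i in spikes)
    value = K / gamma * math.log(-M)
    for a_i, n_i in spikes:
        value += n_i * math.log((1.0 - a_i) / (1.0 + a_i * M))
        value -= n_i * (1.0 / (1.0 + a_i * M) - 1.0 / (1.0 - a_i))
    return value / p


if __name__ == "__main__":
    spikes = [(3.0, 1), (5.0, 1)]
    print(term14_numeric(0.5, 0.5, spikes, 100), term14_closed(0.5, 0.5, spikes, 100))
```

The circle used here is smaller than the contour described in the proof. Since the integrand has no further singularities between the two, the value is the same, and the smaller circle keeps the quadrature simple.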


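Next, consider (15). Taking $f = \log\circ\psi$, we have 

\[
(\log\circ\psi)'\Big(-\frac{1}{m} + \frac{\gamma}{1+m}\Big) = \frac{\lambda}{\lambda x + (1-\lambda)}\bigg|_{x = -\frac{1}{m} + \frac{\gamma}{1+m}} = \frac{\lambda\, m(m+1)}{(1-\lambda)(m-M)(m-N)}.
\]

Then 

\begin{align*}
(15) &= -\frac{1}{2\pi i p}\oint_C (\log\circ\psi)'\Big(-\frac{1}{m} + \frac{\gamma}{1+m}\Big)\sum_{i=1}^{K}\Big(\frac{a_i n_i}{1+a_i m} - \frac{n_i}{1+m}\Big)\Big(\frac{1}{m} - \frac{\gamma m}{(1+m)^2}\Big)\,dm \\
&= -\frac{1}{2\pi i p}\,\frac{\lambda}{1-\lambda}\sum_{i=1}^{K} n_i\oint_C \frac{m(m+1)}{(m-M)(m-N)}\Big(\frac{a_i}{1+a_i m} - \frac{1}{1+m}\Big)\Big(\frac{1}{m} - \frac{\gamma m}{(1+m)^2}\Big)\,dm \\
&= -\frac{1}{2\pi i p}\,\frac{\lambda}{1-\lambda}\sum_{i=1}^{K} n_i\,(B_1 - B_2 - B_3 + B_4),
\end{align*}

where 

\begin{align*}
B_1 &= \oint_C \frac{a_i(m+1)}{(m-M)(m-N)(1+a_i m)}\,dm = 2\pi i\cdot\frac{a_i(M+1)}{(1+a_i M)(M-N)}, \\
B_2 &= \oint_C \frac{a_i\gamma m^2}{(m-M)(m-N)(1+a_i m)(m+1)}\,dm = 2\pi i\cdot\Big\{\frac{a_i\gamma M^2}{(M-N)(1+a_i M)(M+1)} + \frac{a_i\gamma}{(M+1)(N+1)(1-a_i)}\Big\}, \\
B_3 &= \oint_C \frac{1}{(m-M)(m-N)}\,dm = 2\pi i\cdot\frac{1}{M-N},
\end{align*}
and 
\[
B_4 = \oint_C \frac{\gamma m^2}{(m-M)(m-N)(m+1)^2}\,dm = 2\pi i\cdot\Big\{\frac{\gamma M^2}{(M-N)(M+1)^2} - \frac{\gamma(2MN+M+N)}{(M+1)^2(N+1)^2}\Big\}.
\]

Collecting the four terms, we have 

\begin{align*}
(15) = -\frac{\lambda}{(1-\lambda)p}\sum_{i=1}^{K} n_i\bigg[\,&\frac{1}{M-N}\Big\{\frac{a_i(M+1)}{1+a_i M} - \frac{a_i\gamma M^2}{(1+a_i M)(M+1)} - 1 + \frac{\gamma M^2}{(M+1)^2}\Big\} \\
&- \frac{1}{(M+1)(N+1)}\Big\{\frac{a_i\gamma}{1-a_i} + \frac{\gamma(2MN+M+N)}{(M+1)(N+1)}\Big\}\bigg].
\end{align*}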
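The following sketch checks the closed form of (15) against a direct numerical contour integral, and also checks the overall sign convention by taking $f(x) = x$, for which the first moment of $F^{\gamma,H_p}$ is known exactly. It is an illustration under assumptions that are not in the source: $K = \sum_i n_i$, spikes $a_i \in \{3, 5\}$ with $n_i = 1$, $\varphi(a) = a + \gamma a/(a-1)$, and a counter-clockwise circle around $[M, -1]$ that leaves $0$, $N$ and every $-1/a_i$ outside. All function names are hypothetical.

```python
import cmath
import math


def roots_M_N(lam, gamma):
    c = 1.0 - 2.0 * lam + lam * gamma
    disc = math.sqrt(c * c + 4.0 * lam * (1.0 - lam))
    return (-c - disc) / (2.0 * (1.0 - lam)), (-c + disc) / (2.0 * (1.0 - lam))


def contour(lam, gamma, n=2000):
    """Yield (m, dm) on a counter-clockwise circle around [M, -1]."""
    M, _ = roots_M_N(lam, gamma)
    center, radius = (M - 1.0) / 2.0, (-1.0 - M) / 2.0 + 0.2
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        yield center + radius * e, 1j * radius * e * (2.0 * math.pi / n)


def term14_numeric(f, lam, gamma, spikes, p):
    K = sum(n_i for _, n_i in spikes)  # assumption: K is the total multiplicity
    total = 0j
    for m, dm in contour(lam, gamma):
        z = -1.0 / m + gamma / (1.0 + m)
        weight = K / (gamma * m) - sum(
            n_i * a_i ** 2 * m / (1.0 + a_i * m) ** 2 for a_i, n_i in spikes
        )
        total += f(z) * weight * dm
    return -total / (2j * math.pi * p)


def term15_numeric(fprime, lam, gamma, spikes, p):
    total = 0j
    for m, dm in contour(lam, gamma):
        z = -1.0 / m + gamma / (1.0 + m)
        spike_sum = sum(n_i * (a_i / (1.0 + a_i * m) - 1.0 / (1.0 + m)) for a_i, n_i in spikes)
        total += fprime(z) * spike_sum * (1.0 / m - gamma * m / (1.0 + m) ** 2) * dm
    return -total / (2j * math.pi * p)


def term15_closed(lam, gamma, spikes, p):
    """Closed form of (15) for f = log(psi), assembled from B_1 to B_4."""
    M, N = roots_M_N(lam, gamma)
    value = 0.0
    for a_i, n_i in spikes:
        first = (
            a_i * (M + 1.0) / (1.0 + a_i * M)
            - a_i * gamma * M ** 2 / ((1.0 + a_i * M) * (M + 1.0))
            - 1.0
            + gamma * M ** 2 / (M + 1.0) ** 2
        ) / (M - N)
        second = (
            a_i * gamma / (1.0 - a_i)
            + gamma * (2.0 * M * N + M + N) / ((M + 1.0) * (N + 1.0))
        ) / ((M + 1.0) * (N + 1.0))
        value += n_i * (first - second)
    return -lam / ((1.0 - lam) * p) * value


def first_moment_gap(lam, gamma, spikes, p):
    """For f(x) = x, (14) + (15) should equal -(gamma/p) * sum n_i a_i / (a_i - 1)."""
    lhs = term14_numeric(lambda z: z, lam, gamma, spikes, p) + term15_numeric(
        lambda z: 1.0, lam, gamma, spikes, p
    )
    rhs = -gamma / p * sum(n_i * a_i / (a_i - 1.0) for a_i, n_i in spikes)
    return lhs - rhs
```

The first-moment check works because $\int x\,dF^{\gamma,H_p}(x) = \int t\,dH_p(t)$, so the three-term decomposition can be verified without any logarithm. In this sketch `lam` only fixes the contour in that check.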


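To obtain (16), note that the integral $\int \log(\psi(x))\,dF^{\gamma,\delta_1}(x)$ is equal to $-\int \{\psi(x) - \log(\psi(x)) - 1\}\,dF^{\gamma,\delta_1}(x)$, since the M-P law satisfies $\int x\,dF^{\gamma,\delta_1}(x) = \int 1\,dF^{\gamma,\delta_1}(x)$. This gives 

\[
(16) = -\Big(1 - \frac{K}{p}\Big)\int g(x)\,dF^{\gamma,\delta_1}(x) + \frac{1}{p}\sum_{i=1}^{K} n_i\log\psi(\varphi(a_i)) + O(n^{-2}),
\]

where $\varphi(a_i) = a_i + \gamma a_i/(a_i - 1)$. 

Finally, combining the four results, we obtain the centering term: 

\begin{align*}
\int \{\psi(x) - \log(\psi(x)) - 1\}\,dF^{\gamma,H_p}(x)
&= \int (\lambda x - \lambda)\,dF^{\gamma,H_p}(x) - (14) - (15) - (16) \\
&= \Big(1 - \frac{K}{p}\Big)\int g(x)\,dF^{\gamma,\delta_1}(x) + \frac{1}{p}\,C_n + O(n^{-2}),
\end{align*}

where 

\begin{align*}
C_n ={}& \lambda\sum_{i=1}^{K} n_i a_i - \lambda K - \sum_{i=1}^{K} n_i\log\psi(\varphi(a_i)) \\
&- \frac{K}{\gamma}\log(-M) - \sum_{i=1}^{K} n_i\log\Big(\frac{1-a_i}{1+a_i M}\Big) + \sum_{i=1}^{K} n_i\Big(\frac{1}{1+a_i M} - \frac{1}{1-a_i}\Big) \\
&+ \frac{\lambda}{1-\lambda}\sum_{i=1}^{K} n_i\bigg[\frac{1}{M-N}\Big\{\frac{a_i(M+1)}{1+a_i M} - \frac{a_i\gamma M^2}{(1+a_i M)(M+1)} - 1 + \frac{\gamma M^2}{(M+1)^2}\Big\} \\
&\qquad\qquad - \frac{1}{(M+1)(N+1)}\Big\{\frac{a_i\gamma}{1-a_i} + \frac{\gamma(2MN+M+N)}{(M+1)(N+1)}\Big\}\bigg].
\end{align*}

□ 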


A.3 Technical lemma 

Lemma 1. Let $z_A$ and $z_B$ be any two different fixed complex numbers. Then, (a) for any 
contour $C$ enclosing $z_A$ and $z_B$ such that 

\[
\log\frac{z - z_A}{z - z_B}
\]

is single-valued on the contour, we have 

\[
\oint_C \frac{1}{z - z_A}\,\log\frac{z - z_A}{z - z_B}\,dz = 0,
\]

(b) for any contour $C$ enclosing $z_A$ and $z_B$, we have 

\[
\oint_C \frac{dz}{(z - z_A)(z - z_B)} = 0.
\]
Figure 3: Illustration of the path of the contour integral 


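Proof. (a) Let $C^*$ be a contour enclosing $C$ such that $C$ and $C^*$ are both clockwise (or both anticlockwise), 
and on $C^*$ we have $|z - z_A| > |z_B - z_A|$ (see Figure 3). Then $D$, the singly connected 
region between $C$ and $C^*$ as indicated in Figure 3, contains no singularity. Therefore, the 
integrals on $C$ and $C^*$ are the same. Next, consider the power series expansion 

\[
\log\frac{z - z_A}{z - z_B} = -\log\Big(1 + \frac{z_A - z_B}{z - z_A}\Big) = -\frac{z_A - z_B}{z - z_A} + \frac{1}{2}\Big(\frac{z_A - z_B}{z - z_A}\Big)^2 - \frac{1}{3}\Big(\frac{z_A - z_B}{z - z_A}\Big)^3 + \cdots.
\]

This power series converges on $C^*$. The desired result is a consequence of 

\[
\oint_{C^*} \frac{dz}{(z - z_A)^k} = 0
\]

for all $k = 2, 3, \ldots$. 

(b) Applying Cauchy's formula, we have 

\[
\oint_C \frac{dz}{(z - z_A)(z - z_B)} = \frac{1}{z_A - z_B}\oint_C \Big\{\frac{1}{z - z_A} - \frac{1}{z - z_B}\Big\}\,dz = \frac{2\pi i - 2\pi i}{z_A - z_B} = 0.
\]

□ 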
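Both parts of Lemma 1 can be illustrated numerically. The sketch below is an added example with arbitrarily chosen points $z_A$, $z_B$ and a circular contour. It relies on the fact that the principal logarithm of $(z - z_A)/(z - z_B)$ is continuous on any contour that does not cross the segment joining $z_A$ and $z_B$. The function name is hypothetical.

```python
import cmath
import math


def circle_integral(f, center, radius, n=2000):
    """Trapezoid rule on a counter-clockwise circle."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        total += f(center + radius * e) * (1j * radius * e)
    return total * (2.0 * math.pi / n)


# Arbitrary illustrative points; the circle encloses both with a margin of 1.
z_a = complex(0.3, 0.2)
z_b = complex(-0.5, -0.4)
center = (z_a + z_b) / 2.0
radius = abs(z_a - z_b) / 2.0 + 1.0

# Part (a): the logarithmic integral.
part_a = circle_integral(
    lambda z: cmath.log((z - z_a) / (z - z_b)) / (z - z_a), center, radius
)
# Part (b): the two residues cancel.
part_b = circle_integral(lambda z: 1.0 / ((z - z_a) * (z - z_b)), center, radius)

if __name__ == "__main__":
    print(abs(part_a), abs(part_b))  # both are zero up to quadrature error
```

Swapping the roles of $z_A$ and $z_B$ only changes the sign of the logarithm, so the same check covers the two ways the lemma is applied in (28) and (29).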